Abstract. We want to reconstruct a signal based on inhomogeneous data (the amount of
data can vary strongly), using the model of regression with a random design. Our aim is
to understand the consequences of inhomogeneity on the accuracy of estimation within the
minimax framework. Using the uniform metric weighted by a spatially-dependent rate as a
benchmark for an estimator's accuracy, we are able to capture the deformation of the usual
minimax rate in situations with local lacks of data (modelled by a design density with
vanishing points). In particular, we construct an estimator that is both design and smoothness
adaptive, and a new criterion is developed to prove the optimality of these deformed rates.



1. Introduction 

Motivations. A problem particularly prominent in statistical literature is the adaptive 
reconstruction of a function based on irregularly sampled noisy data. In several practi- 
cal situations, the statistician cannot obtain "nice" regularly sampled observations, be- 
cause of various constraints linked with the source of the data, or the way the data is 
obtained. For instance, in signal or image processing, the irregular sampling can be due 
to the process of motion or disparity compensation (used in advanced video processing), 
while in topography, measurement constraints are linked with the properties of the ground. 
See Feichtinger and Grochenig (1994) for a survey on irregular sampling, Almansa et al. 
(2003), Vazquez et al. (2000) for applications concerning respectively satellite image and 
stereo imaging, and Jansen et al. (2004) for examples of geographical constraints. 

Such constraints can result in potentially strong local lacks of data. Consequently, the
accuracy of a procedure based on such data can become locally very poor. The aim of the
paper is to study from a theoretical point of view the consequences of data inhomogeneity
on the reconstruction of a univariate signal. Natural questions arise: how does the
inhomogeneity impact on the accuracy of estimation? What does the optimal convergence rate
become in such situations? Can the rate vary strongly from place to place, and how?

The model. The most widespread way to model such observations is as follows. We model
the available data $[(X_i, Y_i); 1 \le i \le n]$ by

$$Y_i = f(X_i) + \sigma \xi_i, \qquad (1.1)$$

where the $\xi_i$ are i.i.d. standard Gaussian and independent of the $X_i$'s, and $\sigma > 0$ is the
noise level. The design variables $X_i$ are i.i.d. with unknown density $\mu$ on $[0, 1]$. The further
the density $\mu$ is from the uniform law, the more inhomogeneous the data drawn from (1.1) is. A
simple way to include situations with local lacks of data within the model (1.1) is to allow
the density $\mu$ to be arbitrarily small at some points, and to vanish. This kind of behaviour
is not commonly considered in the literature, since most papers assume $\mu$ to be uniformly
bounded away from zero. We give references handling this kind of design below.

In practice, we do not know $\mu$, since this would require knowing precisely the constraints
that make the observations irregularly sampled, nor do we know the smoothness of $f$.
Therefore, a convenient procedure must adapt both to the design and to the smoothness of $f$.
Such a procedure (that is proved to be optimal) is constructed here.

Methodology. We want to reconstruct $f$ globally, with sup norm loss. The reason for
choosing this metric is that it is exacting: roughly, it forces an estimator to behave well at
every point simultaneously. This property is convenient here, since it allows us to capture in
a very simple way the consequences of inhomogeneity directly on the convergence rate.

In what follows, $a_n \lesssim b_n$ means $a_n \le C b_n$ for any $n$, where $C > 0$ is a constant. We say
that a sequence of curves $v_n(\cdot)$ is an upper bound over some class $F$ if there is an estimator
$\hat f_n$ such that

$$\sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in [0,1]} v_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] \lesssim 1 \qquad (1.2)$$

as $n \to +\infty$, where $\mathbb{E}_{f\mu}$ denotes the expectation with respect to the joint law of
the $[(X_i, Y_i); 1 \le i \le n]$, and where $w(\cdot)$ is a loss function, that is a non-negative and
non-decreasing function such that $w(0) = 0$ and $w(x) \le A(1 + |x|^b)$ for some $A, b > 0$.

Literature. Pointwise estimation at a point where the design vanishes is studied in Hall et al.
(1997), with the use of a local linear procedure. This design behaviour is given as an
example in Guerre (1999), where a more general setting for the design is considered, with
a Lipschitz regression function. In Gaiffas (2005a), pointwise minimax rates over Hölder
classes are computed for several design behaviours, and an adaptive estimator for pointwise
risk is constructed in Gaiffas (2005b). In these papers, it appears that, depending on the
design behaviour at the estimation point, the range of minimax rates is very wide: from
very slow (logarithmic) rates to very fast quasi-parametric rates.

Many adaptive techniques have been developed in the literature for handling irregularly
sampled data. Among wavelet methods, see Hall et al. (1997) for interpolation; Antoniadis et al.
(1997), Antoniadis and Pham (1998), Brown and Cai (1998), Hall et al. (1998), Wong and Zheng
(2002) for transformation and binning; Antoniadis and Fan (2001) for a penalization
approach; Delouille et al. (2001) and Delouille et al. (2004) for the construction of
design-adapted wavelets via lifting; Pensky and Wiens (2001) for projection-based techniques and
Kerkyacharian and Picard (2004) for warped wavelets. For model selection, see Baraud
(2002). See also the PhD manuscripts from Maxim (2003) and Delouille (2002).



2. Results 

To measure the smoothness of $f$, we consider the standard Hölder class $H(s, L)$ where
$s, L > 0$, defined as the set of all the functions $f : [0, 1] \to \mathbb{R}$ such that

$$|f^{(\lfloor s \rfloor)}(x) - f^{(\lfloor s \rfloor)}(y)| \le L |x - y|^{s - \lfloor s \rfloor}, \quad \forall x, y \in [0, 1],$$

where $\lfloor s \rfloor$ is the largest integer smaller than $s$. Minimax theory over such classes is standard:
we know from Stone (1982) that within the model (1.1), the minimax rate is equal to
$(\log n/n)^{s/(2s+1)}$ over such classes, when $\mu$ is continuous and uniformly bounded away from
zero. If $Q > 0$, we define $H^Q(s, L) := H(s, L) \cap \{f \mid \|f\|_\infty \le Q\}$ (the constant $Q$ need not
be known).

We use the notation $\mu(I) := \int_I \mu(t)\,dt$. If $F = H(s, L)$ is fixed, we consider the sequence
of positive curves $h_n(\cdot) = h_n(\cdot; F, \mu)$ satisfying

$$L h_n(x)^s = \sigma \Big( \frac{\log n}{n \mu([x - h_n(x), x + h_n(x)])} \Big)^{1/2} \qquad (2.1)$$

for any $x \in [0, 1]$, and we define

$$r_n(x; F, \mu) := L h_n(x; F, \mu)^s.$$

Since $h \mapsto h^{2s} \mu([x - h, x + h])$ is increasing for any $x$, these curves are well-defined (for $n$
large enough) and unique. In Theorem 1 below, we show that $r_n(\cdot)$ is an upper bound over
Hölder classes, and the optimality of this rate is proved in Theorem 2.

Example. When $s = 1$, $\sigma = L = 1$ and $\mu(x) = 4|x - 1/2| \mathbf{1}_{[0,1]}(x)$, solving (2.1) leads to

$$r_n(x) = (\log n/n)^{\alpha_n(x)},$$

where the exponent $\alpha_n(\cdot)$ is given by

$$\alpha_n(x) = \begin{cases}
\dfrac{1}{3}\Big(1 - \dfrac{\log(1 - 2x)}{\log(\log n/n)}\Big) & \text{when } x \in \big[0, \tfrac{1}{2} - \big(\tfrac{\log n}{2n}\big)^{1/4}\big], \\[2ex]
\dfrac{\log\big(((x - 1/2)^4 + 4 \log n/n)^{1/2} - (x - 1/2)^2\big) - \log 2}{2 \log(\log n/n)} & \text{when } x \in \big[\tfrac{1}{2} - \big(\tfrac{\log n}{2n}\big)^{1/4}, \tfrac{1}{2} + \big(\tfrac{\log n}{2n}\big)^{1/4}\big], \\[2ex]
\dfrac{1}{3}\Big(1 - \dfrac{\log(2x - 1)}{\log(\log n/n)}\Big) & \text{when } x \in \big[\tfrac{1}{2} + \big(\tfrac{\log n}{2n}\big)^{1/4}, 1\big].
\end{cases}$$

Within this example, $r_n(\cdot)$ switches from one "regime" to another. Indeed, in this example
there is a lack of data in the middle of the unit interval. The consequence is that $r_n(1/2) =
(\log n/n)^{1/4}$ is slower than the rate at the boundaries $r_n(0) = r_n(1) = (\log n/n)^{1/3}$, which
comes from the standard minimax rate $(\log n/n)^{s/(2s+1)}$ with $s = 1$. We show the shape of
this deformed rate for several sample sizes in Figure 1.

Figure 1. $r_n(\cdot)$ and $\alpha_n(\cdot)$ for several sample sizes
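The defining relation (2.1) rarely has a closed form, but since $h \mapsto h^{2s} \mu([x - h, x + h])$ is increasing it can be solved by bisection. The following is a minimal sketch under stated assumptions, not the procedure of the paper: the function names (`solve_bandwidth`, `uniform_mass`, `vee_mass`) are hypothetical, the design measure is passed as a function returning $\mu([a, b])$, and the density $4|t - 1/2|$ on $[0, 1]$ is an illustrative choice of a design vanishing at $1/2$.

```python
import math


def solve_bandwidth(x, n, s, L, sigma, mass, tol=1e-12):
    """Solve L*h**s = sigma*sqrt(log(n) / (n*mass(x-h, x+h))) for h in (0, 1].

    `mass(a, b)` must return the design mass mu([a, b]); it is an assumed
    interface, chosen so that any design density can be plugged in.
    """
    log_n = math.log(n)

    def gap(h):
        # Squared form of (2.1); increasing in h, positive once h is too large.
        return (L * h ** s) ** 2 * n * mass(x - h, x + h) - sigma ** 2 * log_n

    lo, hi = 0.0, 1.0
    if gap(hi) < 0:
        raise ValueError("n is too small for (2.1) to have a solution in (0, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi


def uniform_mass(a, b):
    """mu([a, b]) for the uniform density on [0, 1]."""
    return max(min(b, 1.0) - max(a, 0.0), 0.0)


def vee_mass(a, b):
    """mu([a, b]) for the density 4|t - 1/2| on [0, 1] (vanishes at 1/2)."""
    def cdf(t):
        t = min(max(t, 0.0), 1.0)
        return 0.5 + 2.0 * (t - 0.5) * abs(t - 0.5)
    return cdf(b) - cdf(a)


n = 10 ** 6
for x in (0.0, 0.25, 0.5):
    h = solve_bandwidth(x, n, s=1.0, L=1.0, sigma=1.0, mass=vee_mass)
    print(x, h)  # the rate is r_n(x) = L * h**s, here simply h
```

With the vanishing density the bandwidth at $x = 1/2$ is of order $(\log n/n)^{1/4}$, larger than the order $(\log n/n)^{1/3}$ obtained where the density is bounded away from zero, which is the deformation of the rate discussed in the text.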



Upper bound. In this section, we show that the spatially-dependent rate $r_n(\cdot)$ defined
by (2.1) is an upper bound in the sense of (1.2) over Hölder classes. The estimator used in
this upper bound is both smoothness and design adaptive (it does not depend on the design
density within its construction). This estimator is constructed in Section 3 below. Let $R$
be a fixed natural integer.

Assumption D. We assume that $\mu$ is continuous, and that either $\mu(x) > 0$ for any $x$, or
$\mu(x) = 0$ for a finite number of $x$. Moreover, for any $x$ such that $\mu(x) = 0$ we assume that
$\mu(y) = |y - x|^{\beta(x)}$ for any $y$ in a neighbourhood of $x$ (where $\beta(x) \ge 0$).

Theorem 1. Let $s \in (0, R + 1]$ and suppose that Assumption D holds. The estimator $\hat f_n$
defined by (3.2) satisfies

$$\sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in [0,1]} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] \lesssim 1 \qquad (2.2)$$

as $n \to +\infty$ for any $F = H^Q(s, L)$, where $r_n(\cdot) = r_n(\cdot; F, \mu)$ is given by (2.1).

This theorem assesses the adaptive estimator constructed in Section 3 below. The
estimator $\hat f_n$ is based on a precise estimation of the scaling coefficients (within a multiresolution
analysis) of $f$. This method relies on a Lepski-type method (see for instance Lepski et al.
(1997)) that we adapt for random designs.

Remark. Within Theorem 1, there are mainly two situations.

• $\mu(x) > 0$ for any $x$: we have $r_n(x) \asymp (\log n/n)^{s/(2s+1)}$ for any $x$, where $a_n \asymp b_n$
means $a_n \lesssim b_n$ and $b_n \lesssim a_n$. Hence, we recover the standard minimax rate in this
situation. Note that this result is new since adaptive estimators over Hölder balls
in regression with random design were not previously constructed.

• $\mu(x) = 0$ for one or several $x$: the rate $r_n(\cdot)$ can vary strongly from place to place,
depending on the behaviour of $\mu$. Indeed, the rate changes in order from one point
to another, see the example above.

Remark. Implicitly, we assumed in Theorem 1 that $s \in (0, R + 1]$, where $R$ is a tuning
parameter of the procedure. Indeed, in the minimax framework considered here, the fact
of knowing an upper bound for $s$ is usual in the study of adaptive methods, and somehow,
unavoidable. For instance, when considering adaptive wavelet methods, the "maximum
smoothness" corresponds to the number of moments of the mother wavelet.

Optimality of $r_n(\cdot)$. We have seen that the rate $r_n(\cdot)$ defined by (2.1) is an upper bound
over Hölder classes, see Theorem 1. In Theorem 2 below, we prove that this rate is indeed
optimal. In order to show that $r_n(\cdot)$ is optimal in the minimax sense over some class $F$, the
classical criterion consists in showing that

$$\inf_{\hat f_n} \sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in [0,1]} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] \gtrsim 1, \qquad (2.3)$$

where the infimum is taken among all estimators based on the observations (1.1). However,
this criterion does not exclude the existence of another normalisation $\rho_n(\cdot)$ that can improve
$r_n(\cdot)$ in some regions of $[0, 1]$. Indeed, (2.3) roughly amounts to a lower bound on the uniform
risk over the whole unit interval, and hence only at some particular points. Therefore,
we need a new criterion that strengthens the usual minimax one to prove the optimality
of $r_n(\cdot)$. The idea is simple: we localize (2.3) by replacing the supremum over $[0, 1]$ by a
supremum over any (small) interval $I_n \subset [0, 1]$, that is

$$\inf_{\hat f_n} \sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] \gtrsim 1, \quad \forall I_n. \qquad (2.4)$$

It is noteworthy that in (2.4), the length of the intervals cannot be arbitrarily small.
Actually, if an interval $I_n$ has a length smaller than a given limit, (2.4) does not hold anymore.
Indeed, beyond this limit, we can improve $r_n(\cdot)$ for the risk localized over $I_n$: we can
construct an estimator $\hat f_n$ such that

$$\sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] = o(1), \qquad (2.5)$$

see Proposition 1 below. The phenomenon described in this section, which concerns the
uniform risk, is linked with the results from Cai and Low (2005) for shrunk $L^2$ risks. In
what follows, $|I|$ stands for the length of an interval $I$.

Theorem 2. Suppose that

$$\mu(I) \gtrsim |I|^{\beta + 1} \qquad (2.6)$$

uniformly for any interval $I \subset [0, 1]$, where $\beta \ge 0$ and let $F = H(s, L)$. Then, for any
interval $I_n \subset [0, 1]$ such that

$$|I_n| \sim n^{-\alpha} \qquad (2.7)$$

with $\alpha \in (0, (1 + 2s + \beta)^{-1})$, we have

$$\inf_{\hat f_n} \sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] \gtrsim 1 \qquad (2.8)$$

as $n \to +\infty$, where $r_n(\cdot) = r_n(\cdot; F, \mu)$ is given by (2.1).

Corollary 1. If $v_n(\cdot)$ is an upper bound over $F = H(s, L)$ in the sense of (1.2), we have

$$\sup_{x \in I_n} v_n(x)/r_n(x) \gtrsim 1$$

for any interval $I_n$ as in Theorem 2. Hence, $r_n(\cdot)$ cannot be improved uniformly over an
interval with length $n^{\varepsilon - 1/(1 + 2s + \beta)}$, for any arbitrarily small $\varepsilon > 0$.

Proposition 1. Let $F = H(s, L)$ and $\ell_n$ be a positive sequence satisfying

$$\log \ell_n = o(\log n).$$

a) Let $\mu$ be such that $0 < \mu(x) < +\infty$ for any $x \in [0, 1]$. Note that in this case, $r_n(x) \asymp
(\log n/n)^{s/(2s+1)}$ for any $x \in [0, 1]$ and that (2.6) holds with $\beta = 0$. If $I_n$ is an interval
satisfying

$$|I_n| \sim (\ell_n/n)^{1/(1 + 2s)},$$

we can construct an estimator $\hat f_n$ such that

$$\sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \Big(\frac{n}{\log n}\Big)^{s/(2s+1)} \sup_{x \in I_n} |\hat f_n(x) - f(x)| \Big) \Big] = o(1).$$

b) Let $\mu(x_0) = 0$ for some $x_0 \in [0, 1]$ and $\mu([x_0 - h, x_0 + h]) = h^{\beta + 1}$ where $\beta \ge 0$ for any $h$
in a neighbourhood of 0. If

$$I_n = \big[x_0 - (\ell_n/n)^{1/(1 + 2s + \beta)}, x_0 + (\ell_n/n)^{1/(1 + 2s + \beta)}\big],$$

we can construct an estimator $\hat f_n$ such that

$$\sup_{f \in F} \mathbb{E}_{f\mu}\Big[ w\Big( \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big) \Big] = o(1).$$

This proposition entails that $r_n(\cdot)$ can be improved for localized risks (2.5) over intervals
$I_n$ with size $(\ell_n/n)^{1/(1 + 2s + \beta)}$ where $\ell_n$ can be a slow term such as $(\log n)^\gamma$ for any $\gamma \ge 0$.
A consequence is that the lower bound in Theorem 2 cannot be improved, since (2.8) does
not hold anymore when $I_n$ has a length smaller than (2.7). This phenomenon is linked both
to the choice of the uniform metric for measuring the error of estimation, and to the nature
of the noise within the model (1.1). It is also a consequence of the minimax paradigm: it
is well-known that the minimax risk actually concentrates on some critical functions of the
considered class (that we rescale and place within $I_n$ here, hence the critical length for $I_n$),
which is a property allowing to prove lower bounds such as the one in Theorem 2.



3. Construction of an adaptive estimator 

The adaptive method proposed here differs from the techniques mentioned in the
Introduction. Indeed, it is not appropriate here to apply a wavelet decomposition of the scaling
coefficients at the finest scale since it is an $L^2$-transform, while the criterion (1.2)
considered here uses the uniform metric. This is the reason why we focus the analysis on a
precise estimation of the scaling coefficients. The technique consists in a local polynomial
approximation of $f$ within adaptively selected bandwidths for each scaling coefficient.

Let $(V_j)_{j \ge 0}$ be a multiresolution analysis of $L^2([0, 1])$ with scaling function $\phi$ compactly
supported and $R$-regular (the parameter $R$ comes from Theorem 1), which ensures that

$$\|f - P_j f\|_\infty \lesssim 2^{-js} \qquad (3.1)$$

for any $f \in H(s, L)$ with $s \in (0, R + 1]$, where $P_j$ denotes the projection onto $V_j$. We
use $P_j$ as an interpolation transform. Interpolation transforms in the unit interval are
constructed in Donoho (1992) and Cohen et al. (1993). We have $P_j f = \sum_{k=0}^{2^j - 1} \alpha_{jk} \phi_{jk}$,
where $\phi_{jk}(\cdot) = 2^{j/2} \phi(2^j \cdot - k)$ and $\alpha_{jk} = \int f \phi_{jk}$. We consider the largest integer $J$ such
that $N := 2^J \le n$, and we estimate the scaling coefficients at the high resolution $J$. For
appropriate estimators $\hat\alpha_{Jk}$ of $\alpha_{Jk}$, we simply consider

$$\hat f_n := \sum_{k=0}^{2^J - 1} \hat\alpha_{Jk} \phi_{Jk}. \qquad (3.2)$$

Let us denote by $\mathrm{Pol}_R$ the set of all real polynomials with degree at most $R$. If $f_k \in \mathrm{Pol}_R$
is close to $f$ over the support of $\phi_{jk}$, then

$$\alpha_{jk} = \int f \phi_{jk} \approx \int f_k \phi_{jk}.$$

When the scaling function $\phi$ has $R$ moments, that is

$$\int \phi(t) t^p \, dt = \mathbf{1}_{p = 0}, \quad p \in \{0, \ldots, R\}, \qquad (3.3)$$

and when $f$ is $s$-Hölder for $s \in (0, R + 1]$, accurate estimators of $\alpha_{Jk}$ are given by

$$\hat\alpha_{Jk} := 2^{-J/2} \hat f_k(k 2^{-J}). \qquad (3.4)$$

If $\phi$ does not satisfy (3.3), $\int f_k \phi_{Jk}$ can be computed exactly using a quadrature formula, in
the same way as in Delyon and Juditsky (1995). Indeed, there is a matrix $Q_J$ (characterized
by $\phi$) with entries $(q_{Jkm})$ for $(k, m) \in \{0, \ldots, 2^J - 1\}^2$ such that

$$\int P \phi_{Jk} = 2^{-J/2} \sum_{m \in \Gamma_{Jk}} q_{Jkm} P(m/2^J) \qquad (3.5)$$

for any $P \in \mathrm{Pol}_R$. Within this equation, the entries of the quadrature matrix $Q_J$ satisfy

$$q_{Jkm} \ne 0 \Rightarrow |k - m| \le L_\phi \text{ and } m \in \Gamma_{Jk}, \qquad (3.6)$$

where $L_\phi > 0$ is the support length of $\phi$. Therefore, the matrix $Q_J$ is band-limited. For
instance, if we consider the Coiflets basis, which satisfies the moment condition (3.3), we
have $q_{Jkm} = \mathbf{1}_{k = m}$, and we can use directly (3.4). If the $(\phi(\cdot - k))_k$ are orthogonal, then
$q_{Jkm} = \phi(m - k)$, see Delyon and Juditsky (1995).

For the sake of simplicity, we assume in what follows that $\phi$ satisfies the moment
condition (3.3), thus $\alpha_{Jk}$ is estimated by (3.4). Each polynomial $\hat f_k$ in (3.4) is defined via a least
square minimization which is localized within a data-driven bandwidth $\hat\Delta_k$, hence

$$\hat f_k = \hat f_k^{(\hat\Delta_k)}.$$

Below, we describe the computation of these polynomials and then, we define the selection
rule for the $\hat\Delta_k$.
Local polynomials. The polynomials used to estimate each scaling coefficient are defined
via a slightly modified version of the local polynomial estimator (LPE). This linear method
of estimation is standard, see for instance Fan and Gijbels (1995, 1996), among many others.
For any interval $\delta \subset [0, 1]$, we define the empirical sample measure

$$\bar\mu_n(\delta) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \in \delta},$$

where $\mathbf{1}_\delta$ is the indicator of $\delta$, and if $\bar\mu_n(\delta) > 0$, we introduce the pseudo-inner product

$$\langle f, g \rangle_\delta := \frac{1}{\bar\mu_n(\delta)} \int_\delta f g \, d\bar\mu_n, \qquad (3.7)$$

and $\|g\|_\delta := \langle g, g \rangle_\delta^{1/2}$ the corresponding pseudo-norm. The LPE consists in looking for the
polynomial $\hat f^{(\delta)}$ of degree $R$ which is the closest to the data in the least square sense, with
respect to the localized design-adapted norm $\|\cdot\|_\delta$:

$$\hat f^{(\delta)} := \operatorname{argmin}_{g \in \mathrm{Pol}_R} \|Y - g\|_\delta^2, \qquad (3.8)$$

where we recall that $\mathrm{Pol}_R$ is the set of all real polynomials with degree at most $R$. We
can rewrite (3.8) in a variational form, in which we look for $\hat f^{(\delta)} \in \mathrm{Pol}_R$ such that for any
$\varphi \in \mathrm{Pol}_R$,

$$\langle \hat f^{(\delta)}, \varphi \rangle_\delta = \langle Y, \varphi \rangle_\delta, \qquad (3.9)$$

where it suffices to consider only power functions $\varphi_{kp}(\cdot) = (\cdot - k/2^J)^p$, $0 \le p \le R$ when
estimating in a neighbourhood of the regular sampling point $k/2^J$. The coefficients vector
$\hat\theta_k^{(\delta)} \in \mathbb{R}^{R+1}$ of the polynomial $\hat f_k^{(\delta)}$ is therefore solution, when it makes sense, of the linear
system

$$X_k^{(\delta)} \theta = Y_k^{(\delta)},$$

where for $0 \le p, q \le R$:

$$(X_k^{(\delta)})_{p,q} := \langle \varphi_{kp}, \varphi_{kq} \rangle_\delta \quad \text{and} \quad (Y_k^{(\delta)})_p := \langle Y, \varphi_{kp} \rangle_\delta. \qquad (3.10)$$

We modify this system as follows: when the smallest eigenvalue of $X_k^{(\delta)}$ (which is
non-negative) is too small, we add a correcting term allowing to bound it from below. We
introduce

$$\bar X_k^{(\delta)} := X_k^{(\delta)} + (n \bar\mu_n(\delta))^{-1/2} \, \mathrm{Id}_{R+1} \, \mathbf{1}_{\Omega_k(\delta)^c},$$

where $\mathrm{Id}_{R+1}$ is the identity matrix in $\mathbb{R}^{R+1}$ and

$$\Omega_k(\delta) := \{\lambda(X_k^{(\delta)}) > (n \bar\mu_n(\delta))^{-1/2}\}, \qquad (3.11)$$

where $\lambda(M)$ stands for the smallest eigenvalue of a matrix $M$. The quantity $(n \bar\mu_n(\delta))^{-1/2}$
comes from the variance of $\hat f_k^{(\delta)}$, and this particular choice preserves the convergence rate
of the method. This modification of the classical LPE is convenient in situations with little
data.

Definition 1. When $\bar\mu_n(\delta) > 0$, we consider the solution $\hat\theta_k^{(\delta)}$ of the linear system

$$\bar X_k^{(\delta)} \theta = Y_k^{(\delta)}, \qquad (3.12)$$

and introduce $\hat f_k^{(\delta)}(x) := (\hat\theta_k^{(\delta)})_0 + (\hat\theta_k^{(\delta)})_1 (x - k/2^J) + \cdots + (\hat\theta_k^{(\delta)})_R (x - k/2^J)^R$. When
$\bar\mu_n(\delta) = 0$, we take simply $\hat f_k^{(\delta)} := 0$.
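The corrected local polynomial estimator of Definition 1 can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's implementation: the function name `corrected_lpe` and its signature are hypothetical, the inner products of (3.7) are computed as plain averages over the design points falling in the interval, and the correction $(n \bar\mu_n(\delta))^{-1/2} \mathrm{Id}$ is added exactly when the smallest eigenvalue of the Gram matrix does not exceed that level.

```python
import numpy as np


def corrected_lpe(X, Y, center, interval, R):
    """Coefficients of the corrected local polynomial fit on `interval`.

    Returns theta with theta[p] the coefficient of (x - center)**p, so that
    theta[0] is the fitted value at `center`. Hypothetical helper, not from
    the paper.
    """
    lo, hi = interval
    inside = (X >= lo) & (X <= hi)
    m = int(inside.sum())              # m = n * mu_bar_n(delta)
    if m == 0:
        return np.zeros(R + 1)         # convention of Definition 1
    # Columns are the power functions (x - center)**p, p = 0..R.
    V = np.vander(X[inside] - center, R + 1, increasing=True)
    gram = V.T @ V / m                 # entries <phi_p, phi_q>_delta
    rhs = V.T @ Y[inside] / m          # entries <Y, phi_p>_delta
    shift = m ** -0.5
    if np.linalg.eigvalsh(gram)[0] <= shift:
        # Outside Omega_k(delta): bound the smallest eigenvalue from below.
        gram = gram + shift * np.eye(R + 1)
    return np.linalg.solve(gram, rhs)


# Noise-free linear data are recovered exactly when no correction is needed.
X = np.linspace(0.0, 1.0, 2001)
Y = 1.0 + 2.0 * (X - 0.5)
print(corrected_lpe(X, Y, center=0.5, interval=(0.0, 1.0), R=1))
```

Under the moment condition (3.3), the scaling coefficient estimate (3.4) would then be `2 ** (-J / 2) * theta[0]` with `center = k / 2 ** J`.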



Adaptive bandwidth selection. The adaptive procedure selecting the intervals $\hat\Delta_k$ is
based on a method introduced by Lepski (1990), see also Lepski et al. (1997), and Lepski and Spokoiny
(1997). If a family of linear estimators can be "well-sorted" by their respective variances
(e.g. kernel estimators in the white noise model, see Lepski and Spokoiny (1997)), the
Lepski procedure selects the largest bandwidth such that the corresponding estimator does
not differ "significantly" from estimators with a smaller bandwidth. Following this
principle, we construct a method which adapts to the unknown smoothness, and additionally
to the original Lepski method, to the distribution of the data (the design density is
unknown). Bandwidth selection procedures in local polynomial estimation can be found in
Fan and Gijbels (1995), Goldenshluger and Nemirovski (1997) or Spokoiny (1998).

The idea of the adaptive procedure is the following: when $\hat f_k^{(\delta)}$ is close to $f$ (that is, when
$\delta$ is well-chosen), we have in view of (3.9)

$$\langle \hat f_k^{(\delta')} - \hat f_k^{(\delta)}, \varphi \rangle_{\delta'} = \langle Y - \hat f_k^{(\delta)}, \varphi \rangle_{\delta'} \approx \langle Y - f, \varphi \rangle_{\delta'} = \sigma \langle \xi, \varphi \rangle_{\delta'},$$

for any $\delta' \subset \delta$, $\varphi \in \mathrm{Pol}_R$, where the right-hand side is a noise term. Then, in order to
"remove" this noise, we select the largest $\delta$ such that this noise term remains smaller than
an appropriate threshold, for any $\delta' \subset \delta$ and $\varphi = \varphi_{kp}$, $p \in \{0, \ldots, R\}$. The bandwidth $\hat\Delta_k$
is selected in a fixed set of intervals $G_k$ called grid (which is defined below) as follows:

$$\hat\Delta_k := \operatorname{argmax}_{\delta \in G_k} \Big\{ \bar\mu_n(\delta) \;\Big|\; \forall \delta' \in G_k, \delta' \subset \delta, \forall p \in \{0, \ldots, R\}, \; |\langle \hat f_k^{(\delta)} - \hat f_k^{(\delta')}, \varphi_{kp} \rangle_{\delta'}| \le \|\varphi_{kp}\|_{\delta'} T_n(\delta, \delta') \Big\}, \qquad (3.13)$$

where

$$T_n(\delta, \delta') := \sigma \Big[ \Big( \frac{\log n}{n \bar\mu_n(\delta)} \Big)^{1/2} + D C_R \Big( \frac{\log(n \bar\mu_n(\delta))}{n \bar\mu_n(\delta')} \Big)^{1/2} \Big], \qquad (3.14)$$

with $C_R := 1 + (R + 1)^{1/2}$ and $D > (2(b + 1))^{1/2}$, if we want to prove Theorem 1 with a
loss function satisfying $w(x) \lesssim (1 + |x|^b)$. The threshold choice (3.14) can be understood
in the following way: since the variance of $\hat f_k^{(\delta)}$ is of order $(n \bar\mu_n(\delta))^{-1/2}$, we see that the
two terms in $T_n(\delta, \delta')$ are ratios between a penalizing log term and the variance of the
estimators compared by the rule (3.13). The penalization term is linked with the number
of comparisons necessary to select the bandwidth. To prove Theorem 1, we use the grid

$$G_k := \bigcup_{1 \le i \le n} \big\{ [k 2^{-J} - |X_i - k 2^{-J}|, \; k 2^{-J} + |X_i - k 2^{-J}|] \big\}, \qquad (3.15)$$

and we recall that the scaling coefficients are estimated by

$$\hat\alpha_{Jk} := 2^{-J/2} \hat f_k^{(\hat\Delta_k)}(k 2^{-J}).$$
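The selection rule can be illustrated on the symmetric grid (3.15). The sketch below rests on assumptions that should be read as such: it takes the threshold to be $T_n(\delta, \delta') = \sigma[(\log n/(n \bar\mu_n(\delta)))^{1/2} + D C_R (\log(n \bar\mu_n(\delta))/(n \bar\mu_n(\delta')))^{1/2}]$ and compares $|\langle \hat f^{(\delta)} - \hat f^{(\delta')}, \varphi_{kp} \rangle_{\delta'}|$ with $\|\varphi_{kp}\|_{\delta'} T_n(\delta, \delta')$; the names `_fit` and `lepski_halfwidth` are hypothetical, and $\sigma$ is treated as known.

```python
import numpy as np


def _fit(X, Y, c, h, R):
    """Corrected local polynomial fit on [c - h, c + h] (hypothetical helper)."""
    inside = np.abs(X - c) <= h
    m = int(inside.sum())
    V = np.vander(X[inside] - c, R + 1, increasing=True)
    gram = V.T @ V / m
    if np.linalg.eigvalsh(gram)[0] <= m ** -0.5:
        gram = gram + m ** -0.5 * np.eye(R + 1)
    theta = np.linalg.solve(gram, V.T @ Y[inside] / m)
    return theta, V, m


def lepski_halfwidth(X, Y, c, R, sigma, D):
    """Largest half-width in the symmetric grid passing every comparison."""
    n = len(X)
    C_R = 1.0 + np.sqrt(R + 1.0)
    halfwidths = np.unique(np.abs(X - c))       # sorted, one interval per value
    fits = [_fit(X, Y, c, h, R) for h in halfwidths]
    selected = halfwidths[0]
    for j, (theta, _, m) in enumerate(fits):
        admissible = True
        for theta_p, V_p, m_p in fits[:j]:      # intervals nested inside delta
            T = sigma * (np.sqrt(np.log(n) / m)
                         + D * C_R * np.sqrt(np.log(m) / m_p))
            # <f_delta - f_delta', phi_p>_delta' for p = 0..R
            inner = V_p.T @ (V_p @ (theta - theta_p)) / m_p
            norms = np.sqrt((V_p ** 2).mean(axis=0))
            if np.any(np.abs(inner) > norms * T):
                admissible = False
                break
        if admissible:
            selected = halfwidths[j]            # larger h means larger mass
    return selected


rng = np.random.default_rng(0)
X = rng.uniform(size=120)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(120)
print(lepski_halfwidth(X, Y, c=0.5, R=1, sigma=0.1, D=2.0))
```

Every candidate is compared with all smaller intervals, so this version performs of the order of $n^2$ comparisons per point; it is meant to show the logic of the rule, not to be efficient.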

Remark. In this form, the adaptive estimator has a complexity $O(n^2)$. This can be
decreased using a smaller grid. An example of such a grid is the following: first, we sort
the $(X_i, Y_i)$ into $(X_{(i)}, Y_{(i)})$ such that $X_{(i)} \le X_{(i+1)}$. Then, we consider $i(k)$ such that
$k/2^J \in [X_{(i(k))}, X_{(i(k)+1)}]$ (if necessary, we take $X_{(0)} = 0$ and $X_{(n+1)} = 1$) and for some
$a > 1$ (to be chosen by the statistician) we introduce

$$G_k := \bigcup_{p=0}^{[\log_a(i(k)+1)]} \; \bigcup_{q=0}^{[\log_a(n - i(k))]} \big\{ [X_{(i(k)+1-[a^p])}, X_{(i(k)+[a^q])}] \big\}. \qquad (3.16)$$

With this grid, the selection of the bandwidth is fast, and the complexity of the procedure
is $O(n (\log n)^2)$. We can use this grid in practice, but we need extra assumptions on the
design if we want to prove Theorem 1 with this grid choice.
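A sketch of how the geometric grid could be built from the sorted design is given below. It assumes the reading of (3.16) in which $k/2^J$ lies between $X_{(i(k))}$ and $X_{(i(k)+1)}$, with the conventions $X_{(0)} = 0$ and $X_{(n+1)} = 1$; the names `geometric_grid` and `_offsets` are hypothetical, and the fallback used when no design point lies to the right of the centre is an assumption of this sketch.

```python
import numpy as np


def _offsets(limit, a):
    """Distinct values of floor(a**p), p = 0, 1, ..., with a**p <= limit."""
    out, power = set(), 1.0
    while power <= limit:
        out.add(int(power))
        power *= a
    return sorted(out)


def geometric_grid(X, center, a=2.0):
    """Intervals [X_(i+1-floor(a^p)), X_(i+floor(a^q))] around `center`."""
    if a <= 1.0:
        raise ValueError("the ratio a must be larger than 1")
    n = len(X)
    # xs[j] is the j-th order statistic, padded with X_(0) = 0, X_(n+1) = 1.
    xs = np.concatenate(([0.0], np.sort(X), [1.0]))
    # i = i(k): last index with X_(i) <= center, capped at n.
    i = min(int(np.searchsorted(xs, center, side="right")) - 1, n)
    left = _offsets(i + 1, a)
    # Assumption: if no design point lies to the right, use X_(n+1) = 1.
    right = _offsets(n - i, a) or [1]
    return [(xs[i + 1 - p], xs[i + q]) for p in left for q in right]


rng = np.random.default_rng(1)
X = rng.uniform(size=100)
grid = geometric_grid(X, center=0.5, a=2.0)
print(len(grid), grid[0], grid[-1])
```

The number of intervals is at most $(\log_a(n + 1) + 1)^2$ per point, instead of $n$ for the symmetric grid, which is where the reduction in the number of comparisons comes from.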

4. Proofs 

We recall that the weight function $w(\cdot)$ is non-negative, non-decreasing and such that
$w(x) \le A(1 + |x|^b)$ for some $A, b > 0$. We denote by $\mu^n$ the joint law of $X_1, \ldots, X_n$ and by
$\mathfrak{X}_n$ the sigma-field generated by $X_1, \ldots, X_n$. $|A|$ denotes both the length of an interval $A$ and
the cardinality of a finite set $A$. $M^T$ is the transpose of $M$, and $\xi = (\xi_1, \ldots, \xi_n)^T$.

Proof of Theorem 1. To prove the upper bound, we use the estimator defined by (3.2)
where $\phi$ is a scaling function satisfying (3.3) (for instance the Coiflets basis), and where
the scaling coefficients are estimated by (3.4). Using together (3.1) and the fact that
$r_n(x) \gtrsim (\log n/n)^{s/(1+2s)}$ for any $x$, we have $\sup_{x \in [0,1]} r_n(x)^{-1} \|f - P_J f\|_\infty = o(1)$. Hence,

$$\sup_{x \in [0,1]} r_n(x)^{-1} |\hat f_n(x) - f(x)| \lesssim \sup_{x \in [0,1]} r_n(x)^{-1} \Big| \sum_{k=0}^{2^J - 1} (\hat\alpha_{Jk} - \alpha_{Jk}) \phi_{Jk}(x) \Big|
\lesssim \max_{0 \le k \le 2^J - 1} \sup_{x \in S_k} r_n(x)^{-1} \, 2^{J/2} |\hat\alpha_{Jk} - \alpha_{Jk}|,$$

where $S_k$ denotes the support of $\phi_{Jk}$. Then, expanding $f$ up to the degree $\lfloor s \rfloor \le R$ and
using (3.3), we obtain, with $x_k := k 2^{-J}$,

$$\sup_{x \in [0,1]} r_n(x)^{-1} |\hat f_n(x) - f(x)| \lesssim \max_{0 \le k \le 2^J - 1} \sup_{x \in S_k} r_n(x)^{-1} |\hat f_k^{(\hat\Delta_k)}(x_k) - f(x_k)|. \qquad (4.1)$$

Since $|S_k| = 2^{-J} \asymp n^{-1}$, we have

$$\sup_{x \in S_k} r_n(x)^{-1} \lesssim r_n(x_k)^{-1}. \qquad (4.2)$$

Indeed, since $\mu$ is continuous, $r_n(\cdot)$ is continuously differentiable and we have
$\sup_{x \in S_k} |r_n(x)^{-1} - r_n(x_k)^{-1}| \le 2^{-J} \|(r_n^{-1})'\|_\infty$, where $g'$ stands for the derivative of $g$.
Moreover, $|(r_n(x)^{-1})'| \lesssim |h_n'(x)| h_n(x)^{-(s+1)} \lesssim n$, since $|h_n'(x)| \lesssim 1$ and
$h_n(x) \gtrsim (\log n/n)^{1/(2s+1)}$, thus (4.2).

In what follows, $\|\cdot\|_\infty$ denotes the supremum norm in $\mathbb{R}^{R+1}$. The following lemma
is a version of the bias-variance decomposition of the local polynomial estimator, which
is classical: see for instance Fan and Gijbels (1995, 1996), Goldenshluger and Nemirovski
(1997), Spokoiny (1998), among others. We define the matrix

$$E_k^{(\delta)} := \Lambda_k^{(\delta)} \bar X_k^{(\delta)} \Lambda_k^{(\delta)},$$

where $\bar X_k^{(\delta)}$ is built from (3.10) and $\Lambda_k^{(\delta)} := \mathrm{diag}[\|\varphi_{k0}\|_\delta^{-1}, \ldots, \|\varphi_{kR}\|_\delta^{-1}]$.

Lemma 1. Conditionally on $\mathfrak{X}_n$, for any $f \in H(s, L)$ and $\delta \in G_k$, we have

$$|\hat f_k^{(\delta)}(x_k) - f(x_k)| \lesssim \lambda(E_k^{(\delta)})^{-1} \big( L |\delta|^s + \sigma (n \bar\mu_n(\delta))^{-1/2} \|U_k^{(\delta)} \xi\|_\infty \big)$$

on $\Omega_k(\delta)$, where $U_k^{(\delta)}$ is a $\mathfrak{X}_n$-measurable matrix of size $(R + 1) \times (n \bar\mu_n(\delta))$ satisfying
$U_k^{(\delta)} (U_k^{(\delta)})^T = \mathrm{Id}_{R+1}$.

Note that within Lemma 1, the bandwidth $\delta$ can change from one point $x_k$ to another.
We denote shortly $U_k := U_k^{(\Delta_k)}$. Let us define $W := U \xi$ where $U := (U_0^T, \ldots, U_{2^J - 1}^T)^T$.
In view of Lemma 1, $W$ is conditionally on $\mathfrak{X}_n$ a centered Gaussian vector such that
$\mathbb{E}_{f\mu}[W_k^2 | \mathfrak{X}_n] = 1$ for any $k \in \{0, \ldots, (R+1) 2^J\}$. We introduce $W^N := \max_{0 \le k \le (R+1) 2^J} |W_k|$
and the event $\mathcal{W}_N := \{|W^N - \mathbb{E}[W^N | \mathfrak{X}_n]| \le L_W (\log n)^{1/2}\}$, where $L_W > 0$. We recall
the following classical results about the supremum of a Gaussian vector (see for instance
Ledoux and Talagrand (1991)):

$$\mathbb{E}_{f\mu}[W^N | \mathfrak{X}_n] \lesssim (\log N)^{1/2} \lesssim (\log n)^{1/2},$$

and

$$\mathbb{P}_{f\mu}[\mathcal{W}_N^c | \mathfrak{X}_n] \lesssim \exp(-L_W^2 (\log n)/2) = n^{-L_W^2/2}. \qquad (4.3)$$
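The order $(\log N)^{1/2}$ of the expected maximum can be checked by simulation. The sketch below is only an illustration: it draws independent standard Gaussian coordinates (the coordinates of $W$ in the proof need not be independent), the helper name `mean_abs_max` is hypothetical, and the comparison value $\sqrt{2 \log(2N)}$ is the classical bound for the expected maximum of $N$ absolute standard Gaussian variables, quoted here as a reference point and not taken from the text.

```python
import numpy as np


def mean_abs_max(N, reps=500, seed=0):
    """Monte Carlo estimate of E[max_k |W_k|] for N independent N(0, 1) draws."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((reps, N))
    return float(np.abs(W).max(axis=1).mean())


for N in (16, 256, 4096):
    # Simulated mean next to the reference value sqrt(2 log(2N)).
    print(N, mean_abs_max(N), np.sqrt(2.0 * np.log(2.0 * N)))
```

The simulated means grow slowly with $N$ and stay below the reference value, consistent with the $(\log N)^{1/2}$ growth used in the proof.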
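Let us define the event

$$\mathcal{T}_k := \{\bar\mu_n(\Delta_k) \le \bar\mu_n(\hat\Delta_k)\}$$

and $R_k := \sigma \big( \frac{\log n}{n \bar\mu_n(\Delta_k)} \big)^{1/2}$, where the intervals $\Delta_k$ are given by

$$\Delta_k := \operatorname{argmax}_{\delta \in G_k} \Big\{ \bar\mu_n(\delta) \;\Big|\; L |\delta|^s \le \sigma \Big( \frac{\log n}{n \bar\mu_n(\delta)} \Big)^{1/2} \Big\}.$$

There is an event $\mathcal{S}_n \in \mathfrak{X}_n$ such that $\mu^n[\mathcal{S}_n^c] = o(1)$ faster than any power of $n$, and such
that $R_k \asymp r_n(x_k)$ and $\lambda(E_k^{(\Delta_k)}) \gtrsim 1$, uniformly for any $k \in \{0, \ldots, 2^J - 1\}$. This event is
constructed below. We decompose

$$|\hat f_k^{(\hat\Delta_k)}(x_k) - f(x_k)| \le A_k + B_k + C_k + D_k,$$

where

$$A_k := |\hat f_k^{(\hat\Delta_k)}(x_k) - f(x_k)| \, \mathbf{1}_{\mathcal{W}_N^c \cup \mathcal{S}_n^c},$$
$$B_k := |\hat f_k^{(\hat\Delta_k)}(x_k) - f(x_k)| \, \mathbf{1}_{\mathcal{T}_k^c \cap \mathcal{W}_N \cap \mathcal{S}_n},$$
$$C_k := |\hat f_k^{(\hat\Delta_k)}(x_k) - \hat f_k^{(\Delta_k)}(x_k)| \, \mathbf{1}_{\mathcal{T}_k \cap \mathcal{W}_N \cap \mathcal{S}_n},$$
$$D_k := |\hat f_k^{(\Delta_k)}(x_k) - f(x_k)| \, \mathbf{1}_{\mathcal{W}_N \cap \mathcal{S}_n}.$$

Term $A_k$. For any $\delta \in G_k$, we have

$$|\hat f_k^{(\delta)}(x_k)| \lesssim (n \bar\mu_n(\delta))^{1/2} \|f\|_\infty (1 + W^N). \qquad (4.4)$$

This inequality is proved below. Using (4.4), we can bound

$$\mathbb{E}_{f\mu}\Big[ w\Big( \max_{0 \le k < 2^J} r_n(x_k)^{-1} |\hat f_k^{(\hat\Delta_k)}(x_k)| \Big) \Big| \mathfrak{X}_n \Big]$$

by some power of $n$. Using $\|f\|_\infty \le Q$ together with the fact that $L_W$ can be arbitrarily
large in (4.3) and since $\mu^n[\mathcal{S}_n^c] = o(1)$ faster than any power of $n$, we obtain

$$\mathbb{E}_{f\mu}\Big[ w\Big( \max_{0 \le k < 2^J} r_n(x_k)^{-1} A_k \Big) \Big] = o(1).$$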
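Term $D_k$. Using together Lemma 1, the definition of $\Delta_k$ and the fact that $W^N \lesssim (\log n)^{1/2}$
on $\mathcal{W}_N$, we have

$$|\hat f_k^{(\Delta_k)}(x_k) - f(x_k)| \lesssim \lambda(E_k^{(\Delta_k)})^{-1} R_k (1 + (\log n)^{-1/2} W^N) \lesssim \lambda(E_k^{(\Delta_k)})^{-1} r_n(x_k)$$

on $\mathcal{W}_N \cap \mathcal{S}_n$, thus

$$\mathbb{E}_{f\mu}\Big[ w\Big( \max_{0 \le k < 2^J} r_n(x_k)^{-1} D_k \Big) \Big] \lesssim 1.$$

Term $C_k$. We introduce $G_k(\delta) := \{\delta' \in G_k \mid \delta' \subset \delta\}$ and the following events:

$$T_k(\delta, \delta', p) := \{ |\langle \hat f_k^{(\delta)} - \hat f_k^{(\delta')}, \varphi_{kp} \rangle_{\delta'}| \le \|\varphi_{kp}\|_{\delta'} T_n(\delta, \delta') \},$$
$$T_k(\delta, \delta') := \bigcap_{0 \le p \le R} T_k(\delta, \delta', p),$$
$$T_k(\delta) := \bigcap_{\delta' \in G_k(\delta)} T_k(\delta, \delta').$$

By the definition (3.13) of the selection rule, we have $\mathcal{T}_k \subset T_k(\hat\Delta_k, \Delta_k)$. Let $\delta \in G_k$,
$\delta' \in G_k(\delta)$. On $T_k(\delta, \delta') \cap \Omega_k(\delta')$ we have (see below)

$$|\hat f_k^{(\delta)}(x_k) - \hat f_k^{(\delta')}(x_k)| \lesssim \lambda(E_k^{(\delta')})^{-1} \Big( \frac{\log n}{n \bar\mu_n(\delta')} \Big)^{1/2}. \qquad (4.5)$$

Thus, using (4.5), we obtain

$$\mathbb{E}_{f\mu}\Big[ w\Big( \max_{0 \le k < 2^J} r_n(x_k)^{-1} C_k \Big) \Big] \lesssim 1.$$

Term $B_k$. By the definition (3.13) of the selection rule, we have $\mathcal{T}_k^c \subset T_k(\Delta_k)^c$. We need
the following lemma.

Lemma 2. If $\delta \in G_k$ satisfies

$$L |\delta|^s \le \sigma \Big( \frac{\log n}{n \bar\mu_n(\delta)} \Big)^{1/2} \qquad (4.6)$$

and $f \in H(s, L)$, we have

$$\mathbb{P}_{f\mu}[T_k(\delta)^c | \mathfrak{X}_n] \le (R + 1) (n \bar\mu_n(\delta))^{1 - D^2/2}$$

on $\Omega_k(\delta)$, where $D$ is the constant from the threshold (3.14).

Using together Lemma 2, $\|f\|_\infty \le Q$ and (4.4), we obtain

$$\mathbb{E}_{f\mu}\Big[ w\Big( \max_{0 \le k < 2^J} r_n(x_k)^{-1} B_k \Big) \Big] \lesssim 1,$$

and Theorem 1 follows. □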



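Proof of Lemma 1. On $\Omega_k(\delta)$, we have $\bar X_k^{(\delta)} = X_k^{(\delta)}$, and $\lambda(X_k^{(\delta)}) > (n \bar\mu_n(\delta))^{-1/2} > 0$, thus
$X_k^{(\delta)}$ and $E_k^{(\delta)}$ are invertible. Let $f_k$ be the Taylor polynomial of $f$ at $x_k$ up to the order
$\lfloor s \rfloor$ and $\theta_k \in \mathbb{R}^{R+1}$ be the coefficient vector of $f_k$. Using $f \in H(s, L)$, we obtain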

lif(z fc ) - f(x k )\ < KCaW)" 1 ^ - fc ) , ei )| + 

= K(E?)- 1 A?X?(^-^),e 1 )| + |C. 
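In view of (3.9), we have on $\Omega_k(\delta)$ for any $p \in \{0, \ldots, R\}$:

$$(X_k^{(\delta)} (\hat\theta_k^{(\delta)} - \theta_k))_p = \langle \hat f_k^{(\delta)} - f_k, \varphi_{kp} \rangle_\delta = \langle Y - f_k, \varphi_{kp} \rangle_\delta,$$

thus $X_k^{(\delta)} (\hat\theta_k^{(\delta)} - \theta_k) = B_k^{(\delta)} + \sigma V_k^{(\delta)}$ where $(B_k^{(\delta)})_p := \langle f - f_k, \varphi_{kp} \rangle_\delta$ and $(V_k^{(\delta)})_p := \langle \xi, \varphi_{kp} \rangle_\delta$,
which correspond respectively to bias and variance terms. Since $f \in H(s, L)$ and $\lambda(M)^{-1} =
\|M^{-1}\|$ for any symmetrical and positive matrix $M$, we have

$$|\langle (E_k^{(\delta)})^{-1} \Lambda_k^{(\delta)} B_k^{(\delta)}, e_1 \rangle| \le \lambda(E_k^{(\delta)})^{-1} L |\delta|^s.$$

Since $V_k^{(\delta)} = (n \bar\mu_n(\delta))^{-1} D_k^{(\delta)} \xi$ where $D_k^{(\delta)}$ is the $(R + 1) \times (n \bar\mu_n(\delta))$ matrix with entries
$(D_k^{(\delta)})_{p,i} := (X_i - x_k)^p$, $X_i \in \delta$, we can write

$$|\langle (E_k^{(\delta)})^{-1} \Lambda_k^{(\delta)} \sigma V_k^{(\delta)}, e_1 \rangle| \le \sigma (n \bar\mu_n(\delta))^{-1/2} \|(E_k^{(\delta)})^{-1/2}\| \, \|U_k^{(\delta)} \xi\|_\infty,$$

where $U_k^{(\delta)} := (n \bar\mu_n(\delta))^{-1/2} (E_k^{(\delta)})^{-1/2} \Lambda_k^{(\delta)} D_k^{(\delta)}$ satisfies $U_k^{(\delta)} (U_k^{(\delta)})^T = \mathrm{Id}_{R+1}$ since $E_k^{(\delta)} =
\Lambda_k^{(\delta)} X_k^{(\delta)} \Lambda_k^{(\delta)}$ and $X_k^{(\delta)} = (n \bar\mu_n(\delta))^{-1} D_k^{(\delta)} (D_k^{(\delta)})^T$, thus the lemma. □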



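Proof of (4.4). If $\bar\mu_n(\delta) = 0$, we have $\hat f_k^{(\delta)} = 0$ by definition and the result is obvious,
thus we assume $\bar\mu_n(\delta) > 0$. Since $\lambda(\bar X_k^{(\delta)}) \ge (n \bar\mu_n(\delta))^{-1/2} > 0$, $\bar X_k^{(\delta)}$ and $\Lambda_k^{(\delta)}$ are invertible
and $E_k^{(\delta)}$ also is. The proof of (4.4) is then similar to that of Lemma 1, where the bias is
bounded by $\|f\|_\infty$ and where we use the fact that $\lambda(\bar X_k^{(\delta)}) \ge (n \bar\mu_n(\delta))^{-1/2}$ to control the
variance term. □

Proof of (4.5). Let us define $H_k^{(\delta)} := \Lambda_k^{(\delta)} X_k^{(\delta)}$. On $\Omega_k(\delta')$, we have:

$$|\hat f_k^{(\delta)}(x_k) - \hat f_k^{(\delta')}(x_k)| = |(\hat\theta_k^{(\delta)} - \hat\theta_k^{(\delta')})_0| \lesssim \lambda(E_k^{(\delta')})^{-1} \|H_k^{(\delta')} (\hat\theta_k^{(\delta)} - \hat\theta_k^{(\delta')})\|_\infty.$$

Since on $\Omega_k(\delta')$, $(H_k^{(\delta')} (\hat\theta_k^{(\delta)} - \hat\theta_k^{(\delta')}))_p = \langle \hat f_k^{(\delta)} - \hat f_k^{(\delta')}, \varphi_{kp} \rangle_{\delta'} / \|\varphi_{kp}\|_{\delta'}$, and since $\delta' \subset \delta$, we
obtain (4.5) on $T_k(\delta, \delta')$. □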



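Proof of Lemma 2. We denote by $P_k^{(\delta)}$ the projection onto $\mathrm{Span}\{\varphi_{k0}, \ldots, \varphi_{kR}\}$ with
respect to the inner product $\langle \cdot, \cdot \rangle_\delta$. Note that on $\Omega_k(\delta)$, we have $\hat f_k^{(\delta)} = P_k^{(\delta)} Y$. Let $\delta \in G_k$
and $\delta' \in G_k(\delta)$. In view of (3.9), we have on $\Omega_k(\delta')$ for any $\varphi = \varphi_{kp}$, $p \in \{0, \ldots, R\}$:

$$\langle \hat f_k^{(\delta')} - \hat f_k^{(\delta)}, \varphi \rangle_{\delta'} = \langle Y - \hat f_k^{(\delta)}, \varphi \rangle_{\delta'}
= \langle f - P_k^{(\delta)} Y, \varphi \rangle_{\delta'} + \sigma \langle \xi, \varphi \rangle_{\delta'}
= A_k - B_k + C_k,$$

where $A_k := \langle f - P_k^{(\delta)} f, \varphi \rangle_{\delta'}$, $B_k := \sigma \langle P_k^{(\delta)} \xi, \varphi \rangle_{\delta'}$ and $C_k := \sigma \langle \xi, \varphi \rangle_{\delta'}$. If $f_k$ is the Taylor
polynomial of $f$ at $x_k$ up to the order $\lfloor s \rfloor$, since $\delta' \subset \delta$ and $f \in H(s, L)$ we have:

$$|A_k| \le \|\varphi\|_{\delta'} \|f - f_k + P_k^{(\delta)} (f_k - f)\|_{\delta'} \lesssim \|\varphi\|_{\delta'} \|f - f_k\|_\delta \lesssim \|\varphi\|_{\delta'} L |\delta|^s,$$

and using (4.6), we obtain $|A_k| \le \|\varphi\|_{\delta'} \sigma \big( \frac{\log n}{n \bar\mu_n(\delta)} \big)^{1/2}$. Since $P_k^{(\delta)}$ is an orthogonal projection,
the variance of $B_k$ is equal to

$$\sigma^2 \mathbb{E}_{f\mu}[\langle P_k^{(\delta)} \xi, \varphi \rangle_{\delta'}^2 | \mathfrak{X}_n] \le \sigma^2 \|\varphi\|_{\delta'}^2 \mathbb{E}_{f\mu}[\|P_k^{(\delta)} \xi\|_{\delta'}^2 | \mathfrak{X}_n]
= \sigma^2 \|\varphi\|_{\delta'}^2 \mathrm{Tr}(P_k^{(\delta)}) / (n \bar\mu_n(\delta')),$$

where $\mathrm{Tr}(M)$ stands for the trace of a matrix $M$. Since $P_k^{(\delta)}$ is the projection onto $\mathrm{Pol}_R$,
$\mathrm{Tr}(P_k^{(\delta)}) \le R + 1$, and the variance of $B_k$ is smaller than $\sigma^2 \|\varphi\|_{\delta'}^2 (R + 1) / (n \bar\mu_n(\delta'))$. Then,

$$\mathbb{E}_{f\mu}[(B_k + C_k)^2 | \mathfrak{X}_n] \le \sigma^2 \|\varphi\|_{\delta'}^2 C_R^2 / (n \bar\mu_n(\delta')). \qquad (4.7)$$

In view of the threshold choice (3.14), we have

$$\big\{ |\langle \hat f_k^{(\delta)} - \hat f_k^{(\delta')}, \varphi \rangle_{\delta'}| > \|\varphi\|_{\delta'} T_n(\delta, \delta') \big\}
\subset \Big\{ \frac{|B_k + C_k|}{\sigma \|\varphi\|_{\delta'} (n \bar\mu_n(\delta'))^{-1/2} C_R} > D (\log(n \bar\mu_n(\delta)))^{1/2} \Big\},$$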

UNIFORM ESTIMATION OF A SIGNAL BASED ON INHOMOGENEOUS DATA 17 

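and using (4.7) together with $\mathbb{P}[|N(0, 1)| > x] \le \exp(-x^2/2)$ and $|G_k(\delta)| \le (n \bar\mu_n(\delta))$, we
obtain

$$\mathbb{P}_{f\mu}[T_k(\delta)^c | \mathfrak{X}_n] \le \sum_{\delta' \in G_k(\delta)} \sum_{p=0}^{R} \exp(-D^2 \log(n \bar\mu_n(\delta))/2)
\le (R + 1) (n \bar\mu_n(\delta))^{1 - D^2/2},$$

which concludes the proof. □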

Construction of S n . We construct an event S n G 3L n such that // n [S^] = o(l) faster than 
any power of n, and such that on this event, x r n (xk) and A(E[ Afe ' ) ) > 1 uniformly for any 
k G {0, . . . , 2 J }. We need preliminary approximation results, linked with the approximation 
of [i by Jl n . The following deviation inequalities use Berstein inequality for the sum of 
independent random variables, which is standard. We have 



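\[ \mu^n\Big[ \Big| \frac{\bar\mu_n(\delta)}{\mu(\delta)} - 1 \Big| > \varepsilon \Big] \lesssim \exp\big( -\varepsilon^2 n \mu(\delta) \big) \qquad (4.8) \]

for any interval $\delta \subset [0, 1]$ and $\varepsilon \in (0, 1)$. Let us define the events

\[ D_{n,a}^{(\delta)}(x, \varepsilon) := \Big\{ \Big| \frac{1}{\bar\mu_n(\delta)} \int_\delta \Big( \frac{\cdot - x}{|\delta|} \Big)^a d\bar\mu_n - e_a(x, \mu) \Big| \leq \varepsilon \Big\}, \]

where $e_a(x, \mu) := (1 + (-1)^a)(\beta(x) + 1)/(a + \beta(x) + 1)$ ($a$ is a natural integer) and where we recall that $\beta(x)$ comes from assumption D (if $x$ is such that $\mu(x) > 0$ then $\beta(x) = 0$). Using together the Bernstein inequality and the fact that

\[ \frac{1}{\mu(\delta)} \int_\delta \Big( \frac{\cdot - x}{|\delta|} \Big)^a d\mu \to e_a(x, \mu) \]

as $|\delta| \to 0$, we obtain

\[ \mu^n\big[ (D_{n,a}^{(\delta)}(x, \varepsilon))^c \big] \lesssim \exp\big( -\varepsilon^2 n \mu(\delta) \big). \qquad (4.9) \]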
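The deviation inequality (4.8) can be illustrated numerically. The sketch below simulates the relative error of the empirical design measure on a fixed interval and compares the observed exceedance frequency with the explicit Bernstein bound $2\exp(-\varepsilon^2 n \mu(\delta) / (2(1 + \varepsilon/3)))$. The uniform design, the chosen interval, and the function names (`empirical_mass`, `bernstein_bound`, `exceedance_frequency`) are assumptions made for illustration only; they do not come from the paper.

```python
import math
import random


def empirical_mass(sample, lo, hi):
    """Empirical measure of [lo, hi]: fraction of design points in the interval."""
    return sum(lo <= x <= hi for x in sample) / len(sample)


def bernstein_bound(n, mass, eps):
    """Explicit Bernstein bound on P(|emp/mass - 1| > eps) for n i.i.d. points."""
    return 2.0 * math.exp(-eps ** 2 * n * mass / (2.0 * (1.0 + eps / 3.0)))


def exceedance_frequency(n, lo, hi, eps, reps, seed=0):
    """Monte Carlo frequency of {|emp/mass - 1| > eps} under a uniform design."""
    rng = random.Random(seed)
    mass = hi - lo  # true measure of [lo, hi] for the uniform density on [0, 1]
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        if abs(empirical_mass(sample, lo, hi) / mass - 1.0) > eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    n, lo, hi, eps = 2000, 0.45, 0.55, 0.15
    freq = exceedance_frequency(n, lo, hi, eps, reps=300)
    print(f"observed frequency: {freq:.4f}")
    print(f"Bernstein bound:    {bernstein_bound(n, hi - lo, eps):.4f}")
```

The bound is deliberately conservative: the observed frequency is typically much smaller, which is all the proof requires.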
By definition (3.15) of $G_k$, we have $\Delta_k = [x_k - H_n(x_k), x_k + H_n(x_k)]$ where

\[ H_n(x) := \operatorname*{argmin}_{h \in [0,1]} \Big\{ L h^s \geq \sigma \Big( \frac{\log n}{n \bar\mu_n([x - h, x + h])} \Big)^{1/2} \Big\} \qquad (4.10) \]

is an approximation of $h_n(x)$ (see (2.1)). Since $\bar\mu_n$ is "close" to $\mu$, these quantities are close to each other for any $x$. Indeed, if $\delta_n(x) := [x - h_n(x), x + h_n(x)]$ and $\Delta_n(x) := [x - H_n(x), x + H_n(x)]$ we have using together (4.10) and (2.1):

\[ \{ H_n(x) \leq (1 + \varepsilon) h_n(x) \} = \Big\{ \frac{\bar\mu_n[(1 + \varepsilon)\delta_n(x)]}{\mu[\delta_n(x)]} \geq (1 + \varepsilon)^{-2s} \Big\} \qquad (4.11) \]

for any $\varepsilon \in (0, 1)$, where $(1 + \varepsilon)\delta_n(x) := [x - (1 + \varepsilon)h_n(x), x + (1 + \varepsilon)h_n(x)]$. Hence, for each $x = x_k$, the left hand side event of (4.11) has a probability that can be controlled under assumption D by (4.8), and the same argument holds for $\{H_n(x) > (1 - \varepsilon)h_n(x)\}$.
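Combining (4.8), (4.9) and (4.11), we obtain that the event

\[ B_{n,a}(x, \varepsilon) := \Big\{ \Big| \frac{1}{\bar\mu_n(\Delta_n(x))} \int_{\Delta_n(x)} \Big( \frac{\cdot - x}{|\delta_n(x)|} \Big)^a d\bar\mu_n - e_a(x, \mu) \Big| \leq \varepsilon \Big\} \]

satisfies also (4.9) for $n$ large enough. This proves that $(X^{(\Delta_k)})_{p,q}$ and $(\Lambda^{(\Delta_k)})_p$ are close to $e_{p+q}(x_k, \mu)$ and $e_{2p}(x_k, \mu)^{-1/2}$ respectively on the event

\[ S_n := \bigcap_{a \in \{0, \ldots, 2R\}} \bigcap_{k \in \{0, \ldots, 2^J - 1\}} B_{n,a}(x_k, \varepsilon). \]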

Using the fact that $\lambda(M) = \inf_{\|x\| = 1} x^\top M x$ for a symmetric matrix $M$, where $\lambda(M)$ denotes the smallest eigenvalue of $M$, we can conclude that for $n$ large enough,

\[ \lambda\big( \Lambda^{(\Delta_k)} X^{(\Delta_k)} \Lambda^{(\Delta_k)} \big) \gtrsim \min_{x \in [0,1]} \lambda(E(x, \mu)), \]

where $E(x, \mu)$ has entries $(E(x, \mu))_{p,q} = e_{p+q}(x, \mu)/(e_{2p}(x, \mu) e_{2q}(x, \mu))^{1/2}$. Since $E(x, \mu)$ is positive definite for any $x \in [0, 1]$, we obtain that on $S_n$, $\lambda(X^{(\Delta_k)}) \gtrsim 1$, thus $S_n \subset \Omega_n(\Delta_k)$ and $\lambda(E^{(\Delta_k)}) \gtrsim 1$ uniformly for any $k \in \{0, \ldots, 2^J - 1\}$, since $E^{(\Delta_k)} = \Lambda^{(\Delta_k)} X^{(\Delta_k)} \Lambda^{(\Delta_k)}$ on $\Omega_n(\Delta_k)$. Moreover, since $R_k = L H_n(x_k)^s$, using together (4.8) and (4.11), we obtain $R_k \asymp r_n(x_k)$ uniformly for $k \in \{0, \ldots, 2^J - 1\}$. □
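As an illustration of the data-driven bandwidth $H_n(x)$ of (4.10), the following sketch computes, by bisection, the smallest $h \in [0, 1]$ for which $L h^s$ dominates $\sigma (\log n / (n \bar\mu_n([x - h, x + h])))^{1/2}$, using the fact that $n \bar\mu_n(\delta)$ is simply the number of design points in $\delta$. The simulated design with a density vanishing at $1/2$, the fallback value when no $h$ qualifies, and all function names are assumptions of this sketch rather than elements of the paper.

```python
import bisect
import math
import random


def count_in_window(sorted_design, x, h):
    """Number of design points in [x - h, x + h] (equals n * empirical measure)."""
    left = bisect.bisect_left(sorted_design, x - h)
    right = bisect.bisect_right(sorted_design, x + h)
    return right - left


def balance_holds(sorted_design, x, h, s, L, sigma):
    """Check L * h**s >= sigma * sqrt(log n / (n * mu_n([x - h, x + h])))."""
    n = len(sorted_design)
    count = count_in_window(sorted_design, x, h)
    if count == 0:
        return False  # empty window: the stochastic term is infinite
    return L * h ** s >= sigma * math.sqrt(math.log(n) / count)


def data_driven_bandwidth(design, x, s, L=1.0, sigma=1.0, tol=1e-6):
    """Approximate the smallest h in [0, 1] satisfying the balance condition."""
    pts = sorted(design)
    if not balance_holds(pts, x, 1.0, s, L, sigma):
        return 1.0  # assumption of this sketch: fall back to the largest window
    lo, hi = 0.0, 1.0  # invariant: condition fails at lo, holds at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if balance_holds(pts, x, mid, s, L, sigma):
            hi = mid
        else:
            lo = mid
    return hi


def sample_vanishing_design(n, beta, seed=0):
    """Hypothetical design on [0, 1] with density proportional to |x - 1/2|**beta."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        radius = 0.5 * rng.random() ** (1.0 / (beta + 1.0))
        points.append(0.5 + radius if rng.random() < 0.5 else 0.5 - radius)
    return points


if __name__ == "__main__":
    design = sample_vanishing_design(5000, beta=2.0, seed=1)
    for x in (0.1, 0.3, 0.5):
        print(x, round(data_driven_bandwidth(design, x, s=1.0), 4))
```

The bisection is valid because the left side of the condition increases with $h$ while the right side can only decrease, so the set of admissible bandwidths is an interval ending at 1. Near the vanishing point of the density the window has to grow, which is the deformation of the rate discussed in the paper.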



Proof of Theorem 2. The main features of the proof are, first, a reduction to the Bayesian risk over a hardest cubical subfamily of functions for the $L^\infty$ metric, which is standard (see Korostelev (1993), Donoho (1994), Korostelev and Nussbaum (1999) and Bertin (2004)), and second, the choice of rescaled hypotheses with design-adapted bandwidth $h_n(\cdot)$, necessary to achieve the rate $r_n(\cdot)$.

Let us consider $\varphi \in H(s, L; \mathbb{R})$ (the extension of $H(s, L)$ to the whole real line) with support $[-1, 1]$ and such that $\varphi(0) > 0$. We define

. r / 2 / 1 \\V(2sh 
a := mm 1, - — ^-1 — a) ) 

and 

\[ \Xi_n := 2a \big( 1 + 2^{1/(s - \lfloor s \rfloor)} \big) \sup_{x \in [0,1]} h_n(x), \]

where we recall that $\lfloor s \rfloor$ is the largest integer smaller than $s$. Note that (2.6) entails

~ n < (logn/n) 1 /^ 2 ^). (4.12) 
If $I_n = [c_n, d_n]$, we introduce $x_k := c_n + k \Xi_n$ for $k \in K_n := \{1, \ldots, [|I_n| \Xi_n^{-1}]\}$, and denote for the sake of simplicity $h_k := h_n(x_k)$. We consider the family of functions



which belongs to $H(s, L)$ for any $\theta \in [-1, 1]^{|K_n|}$. Using the Bernstein inequality, we can see that

' J I u(\xh — hk.Xh + hie]) ^ J 



k€K n 

satisfies

\[ \mu^n[\mathcal{H}_n] = 1 - o(1). \qquad (4.13) \]

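Let us introduce $b := c_s \varphi(0)$. For any distribution $B$ on $\Theta_n \subset [-1, 1]^{|K_n|}$, by lower bounding the minimax risk by the Bayesian risk, and since $w$ is non-decreasing, the left hand side of (2.8) is larger than

\[ w(b) \inf_{\hat\theta} \int_{\Theta_n} P_\theta^n\Big[ \max_{k \in K_n} |\hat\theta_k - \theta_k| \geq 1 \Big] B(d\theta) \geq w(b) \int \inf_{\hat\theta} \int_{\Theta_n} P_\theta^n\Big[ \max_{k \in K_n} |\hat\theta_k - \theta_k| \geq 1 \,\Big|\, \mathfrak{X}_n \Big] B(d\theta) \, d\mu^n. \]

Hence, together with (4.13), Theorem 2 follows if we show that on $\mathcal{H}_n$

\[ \sup_{\hat\theta} \int_{\Theta_n} P_\theta^n\Big[ \max_{k \in K_n} |\hat\theta_k - \theta_k| < 1 \,\Big|\, \mathfrak{X}_n \Big] B(d\theta) = o(1). \qquad (4.14) \]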

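We denote by $L(\theta; Y_1, \ldots, Y_n)$ the likelihood function, conditional on $\mathfrak{X}_n$, of the observations $Y_i$ from (1.1) when $f(\cdot) = f(\cdot; \theta)$. Conditionally on $\mathfrak{X}_n$, we have

\[ L(\theta; Y_1, \ldots, Y_n) = \prod_{1 \leq i \leq n} g_\sigma(Y_i) \prod_{k \in K_n} \frac{g_{v_k}(y_k - \theta_k)}{g_{v_k}(y_k)}, \]

where $g_v$ is the density of $N(0, v^2)$, $v_k^2 := E\{y_k^2 | \mathfrak{X}_n\}$ and

\[ y_k := \frac{\sum_{i=1}^n Y_i f_k(X_i)}{\sum_{i=1}^n f_k^2(X_i)}. \]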

Thus, choosing

\[ B := \bigotimes_{k \in K_n} b, \quad b := (\delta_{-1} + \delta_1)/2, \quad \Theta_n := \{-1, 1\}^{|K_n|}, \]

the left hand side of (4.14) is smaller than

\[ \int \frac{\prod_{1 \leq i \leq n} g_\sigma(Y_i)}{\prod_{k \in K_n} g_{v_k}(y_k)} \prod_{k \in K_n} \Big( \sup_{\hat\theta_k} \int_{\{-1,1\}} 1_{|\hat\theta_k - \theta_k| < 1} \, g_{v_k}(y_k - \theta_k) \, b(d\theta_k) \Big) dY_1 \times \cdots \times dY_n, \]

and $\hat\theta_k = 1_{y_k \geq 0} - 1_{y_k < 0}$ are strategies reaching the supremum. Then, in (4.14), it suffices to take the supremum over estimators $\hat\theta$ with coordinates $\hat\theta_k \in \{-1, 1\}$ measurable with respect to $y_k$ only. Since conditionally on $\mathfrak{X}_n$, $y_k$ is in law $N(\theta_k, v_k^2)$, the left hand side of (4.14) is smaller than

\[ \prod_{k \in K_n} \Big( 1 - \inf_{\hat\theta_k \in \{-1,1\}} \int_{\{-1,1\}} \int 1_{|\hat\theta_k(u) - \theta_k| \geq 1} \, g_{v_k}(u - \theta_k) \, du \, b(d\theta_k) \Big). \]

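Moreover, if $\Phi(x) := \int_{-\infty}^x g_1(t) \, dt$,

\[ \inf_{\hat\theta_k \in \{-1,1\}} \int_{\{-1,1\}} \int 1_{|\hat\theta_k(u) - \theta_k| \geq 1} \, g_{v_k}(u - \theta_k) \, du \, b(d\theta_k) = \frac{1}{2} \int \min\big( g_{v_k}(u - 1), g_{v_k}(u + 1) \big) \, du = \Phi(-1/v_k). \]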



On H n , we have in view of (2.1) 



a 2 . 2 



-oo 



Er=i/K^) (l-5)|N2o^logn' 
and since <&(— x) exp(— x 2 /2)(xv / 2vr) for any x > 0, we obtain 

*(-!/«*) > (logn)" 1 /2 n {-i/(i + 2^)}/2 =: Ln 

Thus, the left hand side of (4.14) is smaller than $(1 - L_n)^{|K_n|}$, and since

IInlE-% > „{l/(l+2.+/J)-«}/2 (Iogn) l/2-l/(l+2^ ^ +c 

as $n \to +\infty$, Theorem 2 follows. □
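The two-point testing step of the proof rests on the fact that, for the symmetric prior on $\{-1, 1\}$ and a Gaussian observation with standard deviation $v$, the smallest achievable error probability is $\frac{1}{2} \int \min(g_v(u - 1), g_v(u + 1)) \, du = \Phi(-1/v)$. The sketch below checks this identity by numerical integration and also evaluates the classical Gaussian tail bounds $\frac{x}{1 + x^2} g_1(x) \leq \Phi(-x) \leq g_1(x)/x$ for $x > 0$. The function names and the integration grid are assumptions made for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_density(u, v):
    """Density of N(0, v^2) at u."""
    return math.exp(-0.5 * (u / v) ** 2) / (v * math.sqrt(2.0 * math.pi))


def bayes_error(v, half_width=12.0, steps=200_000):
    """Midpoint-rule value of (1/2) * integral of min(g_v(u - 1), g_v(u + 1))."""
    lo = -1.0 - half_width * v
    hi = 1.0 + half_width * v
    du = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * du
        total += min(normal_density(u - 1.0, v), normal_density(u + 1.0, v))
    return 0.5 * total * du


def mills_bounds(x):
    """Classical bounds x/(1+x^2) * phi(x) <= Phi(-x) <= phi(x)/x for x > 0."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return x / (1.0 + x * x) * phi, phi / x


if __name__ == "__main__":
    for v in (0.5, 1.0, 2.0):
        print(v, bayes_error(v), normal_cdf(-1.0 / v))
    print(mills_bounds(3.0), normal_cdf(-3.0))
```

Since each coordinate contributes an error of size $\Phi(-1/v_k)$, the product bound $(1 - L_n)^{|K_n|}$ vanishes as soon as $|K_n| L_n \to +\infty$.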

Proof of Corollary 1. Let us consider the loss function $w(\cdot) = |\cdot|$, and let $\hat f_n$ be an estimator converging with rate $v_n(\cdot)$ over $F$ in the sense of (2.2). Hence,

\[ 1 \lesssim \sup_{f \in F} E_{f,\mu}\Big[ \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big] \leq \sup_{x \in I_n} \frac{v_n(x)}{r_n(x)} \sup_{f \in F} E_{f,\mu}\Big[ \sup_{x \in I_n} v_n(x)^{-1} |\hat f_n(x) - f(x)| \Big] \lesssim \sup_{x \in I_n} \frac{v_n(x)}{r_n(x)}, \]

where we used Theorem 2. □

Proof of Proposition 1. Without loss of generality, we consider the loss $w(\cdot) = |\cdot|$. To prove Proposition 1, we use the linear LPE. If we denote by $\partial^m f$ the $m$-th derivative of $f$, a slight modification of the proof of Lemma 1 gives for $f \in H(s, L)$ with $s > m$,

\d m f { k\xk) - d m f(x k )\ < A(Ef r 1 !^-™ W +<j(nfi n (5))- 1 / 2 W N ), 
where in the same way as in the proof of Theorem 1, $W^N$ satisfies

\[ E_{f,\mu}[W^N | \mathfrak{X}_n] \lesssim (\log N)^{1/2}, \qquad (4.15) \]

with $N$ depending on the size of the supremum, to be specified below. First, we prove a).
Since $|I_n| \sim (\ell_n/n)^{1/(2s+1)}$, if $I_n = [a_n, b_n]$, the points

\[ x_k := a_n + (k/n)^{1/(2s+1)}, \quad k \in \{0, \ldots, N\}, \]

where $N := [\ell_n]$, belong to $I_n$. We consider the bandwidth

\[ h_n := \Big( \frac{\log \ell_n}{n} \Big)^{1/(2s+1)}, \qquad (4.16) \]

and we take $\delta_k := [x_k - h_n, x_k + h_n]$. Note that since $\mu(x) > 0$ for any $x$, $\bar\mu_n(\delta) \asymp |\delta|$ as $|\delta| \to 0$ with probability going to 1 faster than any power of $n$ (using the Bernstein inequality, for instance). We consider the estimator defined by



\[ \hat f_n(x) := \sum_{m=0}^{r} \partial^m \hat f^{(\delta_k)}(x_k) (x - x_k)^m / m! \quad \text{for } x \in [x_k, x_{k+1}), \; k \in \{0, \ldots, [\ell_n]\}, \qquad (4.17) \]



where $r := \lfloor s \rfloor$. Using a Taylor expansion of $f$ up to the degree $r$ together with (4.16) gives

\[ (n/\log n)^{s/(1+2s)} \sup_{x \in I_n} |\hat f_n(x) - f(x)| \lesssim \Big( \frac{\log \ell_n}{\log n} \Big)^{s/(1+2s)} \big( 1 + (\log \ell_n)^{-1/2} W^N \big). \]

Then, integrating with respect to $P_{f,\mu}(\cdot | \mathfrak{X}_n)$ and using (4.15) where $N = [\ell_n]$ entails a), since $\log \ell_n = o(\log n)$.
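The construction (4.17) glues together, cell by cell, the Taylor polynomials built from local derivative estimates at the grid points $x_k$. A minimal sketch of this idea follows, assuming an unweighted least-squares polynomial fit on each window $[x_k - h, x_k + h]$ as a stand-in for the local polynomial estimator used in the proof. The small linear solver and all function names are hypothetical helpers, and each window is assumed to contain at least `degree + 1` distinct design points.

```python
def solve_linear_system(a, b):
    """Solve a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def local_taylor_coefficients(xs, ys, center, h, degree):
    """Least-squares polynomial fit on [center - h, center + h].

    Returns c_0, ..., c_degree, where c_m estimates f^(m)(center) / m!.
    """
    # rescale to t = (x - center) / h so that the Gram matrix stays well conditioned
    window = [((x - center) / h, y) for x, y in zip(xs, ys) if abs(x - center) <= h]
    size = degree + 1
    gram = [[sum(t ** (p + q) for t, _ in window) for q in range(size)]
            for p in range(size)]
    rhs = [sum(y * t ** p for t, y in window) for p in range(size)]
    scaled = solve_linear_system(gram, rhs)
    return [c / h ** m for m, c in enumerate(scaled)]


def piecewise_taylor_estimator(xs, ys, grid, h, degree):
    """Build x -> sum_m c_m(x_k) (x - x_k)^m on each cell [x_k, x_{k+1})."""
    fits = [local_taylor_coefficients(xs, ys, xk, h, degree) for xk in grid]

    def estimate(x):
        # index of the last grid point <= x, clamped to the grid range
        # (the grid is assumed to be sorted in increasing order)
        k = max(0, min(len(grid) - 1, sum(1 for xk in grid if xk <= x) - 1))
        return sum(c * (x - grid[k]) ** m for m, c in enumerate(fits[k]))

    return estimate


if __name__ == "__main__":
    xs = [i / 200 for i in range(201)]
    ys = [1 + 2 * x - 3 * x * x for x in xs]
    est = piecewise_taylor_estimator(xs, ys, [0.2, 0.4, 0.6, 0.8], h=0.1, degree=2)
    print(est(0.5), est(0.85))
```

On noiseless polynomial data of degree at most `degree` the reconstruction is exact up to rounding, which isolates the approximation step from the stochastic term $W^N$ handled by (4.15).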

The proof of b) is similar to that of a). In this setting, the rate $r_n(\cdot)$ (see (2.1)) can be written as $r_n(x) = (\log n/n)^{\alpha_n(x)}$ for $x$ in $I_n$ (for $n$ large enough) where $\alpha_n(x_0) = s/(1 + 2s + \beta)$ and $\alpha_n(x) > s/(1 + 2s + \beta)$ for $x \in I_n - \{x_0\}$. We define

\[ x_{k+1} := \begin{cases} x_k + n^{-\alpha_n(x_k)/s} & \text{for } k \in \{-N, \ldots, -1\}, \\ x_k + n^{-\alpha_n(x_{k+1})/s} & \text{for } k \in \{0, \ldots, N\}, \end{cases} \]

where $N := [\ell_n]$. All the points fit in $I_n$, since

\[ |x_{-N} - x_N| \leq \sum_{-N \leq k < N} n^{-\min(\alpha_n(x_k), \alpha_n(x_{k+1}))/s} \leq 2 (\ell_n/n)^{1/(1+2s+\beta)}. \]

We consider the bandwidths

\[ h_k := (\log \ell_n / n)^{\alpha_n(x_k)/s}, \]

and the intervals $\delta_k = [x_k - h_k, x_k + h_k]$. We keep the same definition (4.17) for $\hat f_n$. Since $x_0$ is a local extremum of $r_n(\cdot)$, we have in the same way as in the proof of a) that

\[ \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \lesssim \Big[ \max_{-N \leq k \leq -1} \Big( \frac{\log \ell_n}{\log n} \Big)^{\alpha_n(x_k)} + \max_{0 \leq k \leq N-1} \Big( \frac{\log \ell_n}{\log n} \Big)^{\alpha_n(x_{k+1})} \Big] \big( 1 + (\log \ell_n)^{-1/2} W^N \big), \]

hence

\[ E_{f,\mu}\Big[ \sup_{x \in I_n} r_n(x)^{-1} |\hat f_n(x) - f(x)| \Big] \lesssim \Big( \frac{\log \ell_n}{\log n} \Big)^{s/(1+2s+\beta)} = o(1), \]

which concludes the proof of Proposition 1. □